36 research outputs found

    Android Malware Family Classification Based on Resource Consumption over Time

    The vast majority of today's mobile malware targets Android devices, which has driven research on Android malware analysis in recent years. An important task of malware analysis is the classification of malware samples into known families. Static malware analysis is known to fall short against techniques that change static characteristics of the malware (e.g. code obfuscation), while dynamic analysis has proven effective against such techniques. To the best of our knowledge, the most notable work on Android malware family classification purely based on dynamic analysis is DroidScribe. Compared with DroidScribe, our approach is easier to reproduce. Our methodology only employs publicly available tools, does not require any modification to the emulated environment or Android OS, and can collect data from physical devices. The latter is a key factor, since modern mobile malware can detect the emulated environment and hide its malicious behavior. Our approach relies on resource consumption metrics available from the proc file system. Features are extracted through detrended fluctuation analysis and correlation. Finally, an SVM is employed to classify malware into families. We provide an experimental evaluation on malware samples from the Drebin dataset, where we obtain a classification accuracy of 82%, showing that our methodology achieves an accuracy comparable to that of DroidScribe. Furthermore, we make the software we developed publicly available, to ease the reproducibility of our results. Comment: Extended Version
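
    The abstract names detrended fluctuation analysis (DFA) as one of its feature extractors. The following is a minimal sketch of first-order DFA on a one-dimensional series, not the authors' implementation. The function names, the window sizes, and the synthetic input series are assumptions made for illustration; in the paper's setting the input would be a resource consumption metric sampled over time.

```python
import math
import random

def linear_fit(xs, ys):
    """Return the least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def fluctuation(profile, window):
    """RMS residual after removing a linear trend from each full window."""
    xs = list(range(window))
    squared = 0.0
    count = 0
    for start in range(0, len(profile) - window + 1, window):
        segment = profile[start:start + window]
        slope, intercept = linear_fit(xs, segment)
        squared += sum((y - (slope * x + intercept)) ** 2
                       for x, y in zip(xs, segment))
        count += window
    return math.sqrt(squared / count)

def dfa_exponent(series, windows=(4, 8, 16, 32)):
    """Estimate the DFA scaling exponent of a series.

    The window sizes are an illustrative assumption. A constant series
    has zero fluctuation and would make the logarithm fail.
    """
    mean = sum(series) / len(series)
    # Profile: cumulative sum of the mean-centred series.
    profile = []
    total = 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    log_n = [math.log(w) for w in windows]
    log_f = [math.log(fluctuation(profile, w)) for w in windows]
    # The exponent is the slope of log F(n) against log n.
    slope, _ = linear_fit(log_n, log_f)
    return slope

# Synthetic stand-ins for a sampled metric (hypothetical data).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = []
position = 0.0
for step in noise:
    position += step
    walk.append(position)

alpha_noise = dfa_exponent(noise)  # uncorrelated samples
alpha_walk = dfa_exponent(walk)    # running sum of the same samples
```

    A running sum is more strongly correlated over time than the noise it is built from, so its exponent should come out larger. A scalar of this kind, computed per metric, could serve as one input feature for a classifier such as the SVM the abstract mentions.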

    SAFE: Self-Attentive Function Embeddings for Binary Similarity

    The binary similarity problem consists of determining whether two functions are similar by considering only their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc., and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to consider important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general as it works on stripped binaries and on multiple architectures. We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions. Furthermore, we show how clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search). Comment: Published in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 201
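
    The abstract describes comparing function embeddings "through simple and efficient geometric operations". The sketch below assumes cosine similarity as that operation and ranks a small index of precomputed vectors against a query. The function names and the three-dimensional vectors are hypothetical and are not produced by SAFE.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, index):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, cosine_similarity(query, vector))
              for name, vector in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical precomputed embeddings, keyed by function name.
index = {
    "memcpy_variant": [0.9, 0.1, 0.0],
    "sha1_round": [0.0, 0.2, 0.9],
    "strlen_like": [0.5, 0.5, 0.1],
}
query = [1.0, 0.0, 0.0]
ranking = rank_by_similarity(query, index)
best_match = ranking[0][0]
```

    Because the index vectors are computed once, each new query costs only one dot product and two norms per stored function, which is what makes embedding-based search cheap compared with pairwise graph matching.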

    Advanced chondrosarcoma of the pelvis: a rare case of urinary obstruction

    Chondrosarcoma is the second most common malignant tumor of the bone, with an incidence of 1 in 200,000 per year. The axial skeleton is frequently involved and shows poorer oncological outcomes than the appendicular skeleton; the human pelvis is a site of predilection. Chondrosarcoma is rarely associated with urinary obstruction but, depending on its location, it can frequently be linked to compression of pelvic organs such as the bladder, prostate or bowel. We describe the case of a 52-year-old Caucasian male with a history of advanced pelvic chondrosarcoma and severe hydronephrosis due to total bladder dislocation

    Triage of IoT Attacks Through Process Mining

    The impressive growth of the IoT witnessed in recent years came together with a surge in cyber attacks that target it. Factories adhering to digital transformation programs are quickly adopting the IoT paradigm and are thus increasingly exposed to a large number of cyber threats that need to be detected, analyzed and appropriately mitigated. In this scenario, a common approach used in large organizations is to set up an attack triage system. In this setting, security operators can cherry-pick new attack patterns requiring further in-depth investigation from a mass of known attacks that can be managed automatically. In this paper, we propose an attack triage system that helps operators quickly identify attacks with unknown behaviors, and later analyze them in detail. The novelty introduced by our solution is the use of process mining techniques to model known attacks and identify new variants. We demonstrate the feasibility of our approach through an evaluation based on three well-known IoT botnets, BASHLITE, LIGHTAIDRA and MIRAI, and on real current attack patterns collected through an IoT honeypot

    Function Representations for Binary Similarity

    The binary similarity problem consists of determining whether two functions are similar considering only their compiled form. Advanced techniques for binary similarity recently gained momentum as they can be applied in several fields, such as copyright disputes, malware analysis, vulnerability detection, etc. In this paper we describe SAFE, a novel architecture for function representation based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions, and is more general as it works on stripped binaries and on multiple architectures. Results from our experimental evaluation show how SAFE provides a performance improvement with respect to previous solutions. Furthermore, we show how SAFE can be used in widely different use cases, thus providing a general solution for several application scenarios

    How Decoding Strategies Affect the Verifiability of Generated Text

    Recent progress in pre-trained language models has led to systems that are able to generate text of increasingly high quality. While several works have investigated the fluency and grammatical correctness of such models, it is still unclear to what extent the generated text is consistent with factual world knowledge. Here, we go beyond fluency and also investigate the verifiability of text generated by state-of-the-art pre-trained language models. A generated sentence is verifiable if it can be corroborated or disproved by Wikipedia, and we find that the verifiability of generated text strongly depends on the decoding strategy. In particular, we discover a tradeoff between factuality (i.e., the ability to generate Wikipedia-corroborated text) and repetitiveness. While decoding strategies such as top-k and nucleus sampling lead to less repetitive generations, they also produce less verifiable text. Based on these findings, we introduce a simple and effective decoding strategy which, in comparison to previously used decoding strategies, produces less repetitive and more verifiable text. Comment: accepted at Findings of EMNLP 202
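
    The two sampling strategies named in the abstract differ in how they truncate the next-token distribution before sampling. The sketch below illustrates that truncation step only, under assumptions: the function names and the toy four-token distribution are hypothetical, and this is not the decoding strategy the paper introduces.

```python
def renormalise(probs, kept):
    """Rescale the kept tokens so that their probabilities sum to 1."""
    total = sum(probs[token] for token in kept)
    return {token: probs[token] / total for token in kept}

def top_k_filter(probs, k):
    """Keep the k most probable tokens."""
    ordered = sorted(probs, key=probs.get, reverse=True)
    return renormalise(probs, ordered[:k])

def nucleus_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative mass reaches p."""
    kept = []
    cumulative = 0.0
    for token in sorted(probs, key=probs.get, reverse=True):
        kept.append(token)
        cumulative += probs[token]
        if cumulative >= p:
            break
    return renormalise(probs, kept)

# Toy next-token distribution (hypothetical values).
next_token = {"Paris": 0.5, "Lyon": 0.3, "Rome": 0.15, "Oslo": 0.05}

top2 = top_k_filter(next_token, k=2)        # keeps Paris and Lyon
nucleus = nucleus_filter(next_token, p=0.9)  # keeps Paris, Lyon and Rome
```

    Top-k always keeps a fixed number of candidates, while nucleus sampling keeps more candidates when the distribution is flat and fewer when it is peaked. Either way, lower-probability tokens remain eligible, which is consistent with the less repetitive but less verifiable output the abstract reports.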

    Association of kidney disease measures with risk of renal function worsening in patients with type 1 diabetes

    Background: Albuminuria has classically been considered a marker of kidney damage progression in diabetic patients and is routinely assessed to monitor kidney function. However, the role of a mild GFR reduction in the development of stage ≥3 CKD has been less explored in type 1 diabetes mellitus (T1DM) patients. The aim of the present study was to evaluate the prognostic role of kidney disease measures, namely albuminuria and reduced GFR, in the development of stage ≥3 CKD in a large cohort of patients affected by T1DM. Methods: A total of 4284 patients affected by T1DM and followed up at 76 diabetes centers participating in the Italian Association of Clinical Diabetologists (Associazione Medici Diabetologi, AMD) initiative constituted the study population. Urinary albumin excretion (ACR) and estimated GFR (eGFR) were retrieved and analyzed. The incidence of stage ≥3 CKD (eGFR < 60 mL/min/1.73 m2) or eGFR reduction > 30% from baseline was evaluated. Results: The mean estimated GFR was 98 ± 17 mL/min/1.73 m2 and the proportion of patients with albuminuria was 15.3% (n = 654) at baseline. About 8% (n = 337) of patients developed one of the two renal endpoints during the 4-year follow-up period. Age, albuminuria (micro or macro) and baseline eGFR < 90 mL/min/1.73 m2 were independent risk factors for stage ≥3 CKD and renal function worsening. When compared to patients with eGFR > 90 mL/min/1.73 m2 and normoalbuminuria, those with albuminuria at baseline had a 1.69 greater risk of reaching stage 3 CKD, while patients with mild eGFR reduction (i.e. eGFR between 90 and 60 mL/min/1.73 m2) showed a 3.81 greater risk that rose to 8.24 for those patients with albuminuria and mild eGFR reduction at baseline. Conclusions: Albuminuria and eGFR reduction represent independent risk factors for incident stage ≥3 CKD in T1DM patients. The simultaneous occurrence of reduced eGFR and albuminuria has a synergistic effect on renal function worsening
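
    The two renal endpoints in the Methods (eGFR below 60 mL/min/1.73 m2, or an eGFR reduction of more than 30% from baseline) can be written as a single check. This is an illustrative sketch only: the function name and the example values are hypothetical, and it does not reproduce the study's statistical analysis.

```python
STAGE3_EGFR_THRESHOLD = 60.0  # mL/min/1.73 m2, as stated in the abstract
MAX_RELATIVE_DECLINE = 0.30   # reduction of more than 30% from baseline

def reached_renal_endpoint(baseline_egfr, followup_egfr):
    """Return True if either endpoint described in the abstract is met."""
    if baseline_egfr <= 0:
        raise ValueError("baseline eGFR must be positive")
    below_threshold = followup_egfr < STAGE3_EGFR_THRESHOLD
    relative_decline = (baseline_egfr - followup_egfr) / baseline_egfr
    return below_threshold or relative_decline > MAX_RELATIVE_DECLINE

# Hypothetical patients: (baseline eGFR, follow-up eGFR).
examples = {
    "falls below 60": (98.0, 55.0),
    "stable": (98.0, 90.0),
    "large relative decline": (130.0, 85.0),
    "mild decline": (70.0, 62.0),
}
outcomes = {label: reached_renal_endpoint(*pair)
            for label, pair in examples.items()}
```

    The third example shows why the second criterion matters: a follow-up eGFR of 85 is still above 60, yet it counts as renal function worsening because the value fell by more than 30% from baseline.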

    Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks

    In this paper we consider the binary similarity problem, which consists of determining whether two binary functions are similar considering only their compiled form. This problem is known to be crucial in several application scenarios, such as copyright disputes, malware analysis, vulnerability detection, etc. The current state-of-the-art solutions in this field work by creating an embedding model that maps binary functions into vectors in R^n. Such an embedding model captures syntactic and semantic similarity between binaries, i.e., similar binary functions are mapped to points that are close in the vector space. This strategy has many advantages; one of them is the possibility of precomputing embeddings of several binary functions and then comparing them with simple geometric operations (e.g., dot product). In [32] functions are first transformed into Annotated Control Flow Graphs (ACFGs) built from manually engineered features, and then the graphs are embedded into vectors using a deep neural network architecture. In this paper we propose and test several ways to compute annotated control flow graphs that use unsupervised approaches for feature learning, without incurring human bias. Our methods are inspired by techniques used in the natural language processing community (e.g., we use word2vec to encode assembly instructions). We show that our approach is indeed successful, and that it leads to better performance than previous state-of-the-art solutions. Furthermore, we report on a qualitative analysis of function embeddings. We found interesting cases in which embeddings are clustered according to the semantics of the original binary function